[spark] support lateral inner join for vector search by Stefanietry · Pull Request #8252 · apache/paimon

Stefanietry · 2026-06-16T08:03:55Z

Purpose
Purpose: Support lateral join for vector search on spark.
Linked issue: #8251

Tests
Add vector search with lateral join on org.apache.paimon.spark.SparkMultimodalITCase#testVector、org.apache.spark.sql.test.SQLTestUtils#test("lateral vector search preserves subquery alias qualifiers")

JingsongLi · 2026-06-16T08:20:27Z

+
+  override protected def doExecute(): RDD[InternalRow] = {
+    child.execute().mapPartitions {
+      outerRows =>


Can batch queries be supported? Batch queries are crucial for performance. You can take a look to benchmark in https://github.com/apache/paimon-vector-index

Thanks for your reminder. I'll refine it in batch mode later.

JingsongLi · 2026-06-16T15:11:23Z

Please fix test failures.

JingsongLi · 2026-06-23T06:05:53Z

+            _.toPaimonDataField)).asJava)
+      }
+    val sparkRow = SparkInternalRow.create(resultRowType)
+    val vectorSearchBuilder = innerTable


Normal Spark vector_search scans apply pushed partition/data filters before top-k (PaimonBaseScan.evalVectorSearch passes pushedPartitionFilters/pushedDataFilters into the builder). This lateral executor builds the BatchVectorSearchBuilder here without carrying predicates from the search side; PushDownLateralVectorSearchFilter only pushes predicates that reference the left child, so predicates on r.dt or other searched-table columns stay above LateralVectorSearch. A query like ... JOIN LATERAL (...) r WHERE r.dt = '20260420' will pick topK over all partitions and then filter the joined rows, which can return fewer or wrong rows compared with non-lateral vector_search(...) WHERE dt = .... Please preserve search-side filters and apply them via withPartitionFilter/withFilter before newVectorScan()/readBatch(), or reject such predicates explicitly.

JingsongLi · 2026-06-23T06:06:07Z

+
+    scan.plan().splits().asScala.iterator.flatMap {
+      split =>
+        val reader =


This reader is only closed when this inner iterator is fully exhausted. If a downstream operator short-circuits consumption, for example LIMIT 1/take, or if the task is interrupted, Spark can stop pulling rows before hasNext returns false, leaving the current PaimonRecordReaderIterator and its underlying RecordReader open. Please register a TaskContext completion listener or wrap the returned iterator so the current reader is closed on task completion/cancellation as well as normal exhaustion.

Stefanietry force-pushed the support_lateral_join_for_vector_search branch from 8aa9c09 to 774c9b6 Compare June 16, 2026 08:05

JingsongLi reviewed Jun 16, 2026

View reviewed changes

Stefanietry force-pushed the support_lateral_join_for_vector_search branch 2 times, most recently from 4697f65 to c23c76b Compare June 22, 2026 15:04

Stefanietry closed this Jun 22, 2026

Stefanietry reopened this Jun 22, 2026

Stefanietry force-pushed the support_lateral_join_for_vector_search branch 2 times, most recently from a1e3745 to 835b339 Compare June 23, 2026 05:38

JingsongLi reviewed Jun 23, 2026

View reviewed changes

Stefanietry force-pushed the support_lateral_join_for_vector_search branch from 835b339 to 07428ac Compare June 23, 2026 08:00

[spark] support lateral inner join for vector search

c9ee0dd

Stefanietry force-pushed the support_lateral_join_for_vector_search branch from 07428ac to c9ee0dd Compare June 23, 2026 11:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[spark] support lateral inner join for vector search#8252

[spark] support lateral inner join for vector search#8252
Stefanietry wants to merge 1 commit into
apache:masterfrom
Stefanietry:support_lateral_join_for_vector_search

Stefanietry commented Jun 16, 2026

Uh oh!

JingsongLi Jun 16, 2026

Uh oh!

Stefanietry Jun 16, 2026

Uh oh!

JingsongLi commented Jun 16, 2026

Uh oh!

JingsongLi Jun 23, 2026

Uh oh!

JingsongLi Jun 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Stefanietry commented Jun 16, 2026

Uh oh!

JingsongLi Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

Stefanietry Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

JingsongLi commented Jun 16, 2026

Uh oh!

JingsongLi Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

JingsongLi Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants